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Abstract. We present in this paper two different classes of general A"-splitting algorithms for solving finite-dimensional 
convex optimization problems. Under the assumption that the function being minimized has a Lipschitz continuous gradient, 
we prove that the number of iterations needed by the first class of algorithms to obtain an e-optimal solution is 0(l/e). The 
algorithms in the second class are accelerated versions of those in the first class, where the complexity result is improved to 
0(1/ y/e) while the computational effort required at each iteration is almost unchanged. To the best of our knowledge, the 
complexity results presented in this paper are the first ones of this type that have been given for splitting and alternating 
direction type methods. Moreover, all algorithms proposed in this paper are parallelizable, which makes them particularly 
attractive for solving certain large-scale problems. 
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1. Introduction. Many convex optimization problems that arise in practice take the form of a sum of 
convex functions. Often one function is an energy that one wants to minimize and the other functions are 
rcgularization terms to make the solution have certain properties. For example, Tikhonov regularization |28j 
is usually applied to ill-conditioned inverse problems to make them well-posed, compressed sensing [H[5] uses 
l\ rcgularization to obtain sparse solutions, and problems arising from medical imaging adopt both £i and 
total variation (TV) as rcgularization terms [20j . In this paper, we propose and analyze splitting/alternating 
direction algorithms for solving the following convex optimization problem: 

K 

(1.1) min F(x) = Y,M*)> 

where fi : M™ — > M.,i = 1, . . . ,K, are convex functions. When the functions f.^s are well-structured, a well 
established way to solve problem is to split the variable x into K variables by introducing K — 1 new 
variables and then apply an augmented Lagrangian method to solve the resulting problem. Decomposition of 
the augmented Lagrangian function can then be accomplished by applying an alternating direction method 
(ADM) to minimize it. 

Problem (jl.lj) is closely related to the following inclusion problem: 

(1.2) QzTi{x)+--- + T K {x), 

where T\, . . . , Tk are set- valued maximal monotone operators. The goal of problem (|1.2[) is to find a zero of 
the sum of K maximal monotone operators. Note that the optimality conditions for are 

K 
i=l 
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hence, these conditions can be satisfied by solving a problem of the form (|1.2p . 

In the extensive literature on splitting and ADM algorithms, the case K = 2 predominates. The algo- 
rithms for solving (|1.2p when K = 2 are usually based on operator splitting techniques. Important operator 
splitting algorithms include the Douglas- Rachford [pj [TOJ [7] , Peaceman-Rachford [55J , double- backward [5] 
and forward-backward class JT3l [29] of algorithms. Alternating direction methods (ADM) within an aug- 
mented Lagrangian framework for solving are optimization analogs/variants of the Douglas- Rachford 
and Peaceman-Rachford splitting methods. These algorithms have been studied extensively for the case of 
K = 2, and were first proposed in the 1970s for solving optimization problems arising from numerical PDE 
problems [Ml [15] . We refer to [16] and the references therein for more information on splitting and ADM 
algorithms for the case of K = 2. 

Although there is an extensive literature on operator splitting methods, very few convergence results have 
been published on methods for finding a zero of a sum of more than two maximal monotone operators. The 
principal exceptions, are the Jacobi-like method of Spingarn [27] and more recently, the general projective 
splitting methods of Eckstein and Svaiter [TT] . The algorithm addressed in [57] first reduces problem (|1.2[) 
to the sum of two maximal monotone operators by defining new subspaces and operators, and then applies 
a Douglas-Rachford splitting algorithm to solve the new problem. The projective splitting methods in |11] 
do not reduce problem (| 1 . 2|) to the case K = 2. Instead, by using the concept of an extended solution set, 
it is shown in [TT] that solving (|1.2[) is equivalent to finding a point in the extended solution set, and a 
separator-projection algorithm is given to do this. 

Global convergence results for variable splitting ADMs and operator splitting algorithms for the case of 
K = 2 have been proved under various assumptions. However, except for the fairly recently proposed gradient 
methods in |25j and related iterative shrinkage/thresholding algorithms in [5J and the alternating linearization 
methods in |16j . complexity bounds for these methods had not been established. These complexity results 
are extensions of the seminal results of Nesterov [55] [53] , who first showed that certain first-order methods 
that he proposed could obtain an e-optimal solution of a smooth convex programming problem in 0(1/ y/e) 
iterations. Moreover, he showed that his methods were optimal in the sense that this iteration complexity 
was the best that could be obtained using only first-order information. Nesterov's optimal gradient methods 
are accelerated gradient methods that use a combination of previous points to compute the new point at 
each iteration. By combining these methods with smoothing techniques, optimal complexity results were 
obtained for solving nonsmooth problems in [53] [30J . 

In this paper, we propose two classes of multiple variable-splitting algorithms based on alternating 
direction and alternating linearization techniques that can solve problem (|1.1[) for general K (K > 2) and we 
present complexity results for them. (Note that the complexity results in [25] [5] [16] are only for problem 
(jl.ip when K — 2). The algorithms in the first class can be viewed as alternating linearization methods 
in the sense that at each iteration these algorithms perform K minimizations of an approximation to the 
original objective function F by keeping one of the functions fi(x) unchanged and linearizing the other 
K — 1 functions. An alternating linearization method for minimizing the sum of two convex functions was 
studied by Kiwiel et al. [IS]. However, our algorithms differ greatly from the one in [T5] in the way that the 
proximal terms are chosen. Moreover, our algorithms are more general as they can solve general problems 
with K(K > 2) functions. Furthermore, we prove that the iteration complexity of this class of splitting 
algorithms is 0(1/ e) for an e-optimal solution. To the best of our knowledge, this is the first complexity 
result of this type for splitting/alternating direction type algorithms. The algorithms in our second class are 
accelerated versions of the algorithms in our first class and have 0(1/ \f£) iteration complexities. This class 
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of splitting algorithms is also new as are the complexity results. 

Our new algorithms have, in addition, several practical advantages. First, they are all parallelizable. 
Thus, although at each iteration we solve K subproblems, the CPU time required should be approximately 
equal to the time required to solve the most difficult of the subproblems if we have K processors that can work 
in parallel. Second, since every function /j is minimized once at each iteration, it is likely that our algorithms 
will need fewer iterations to converge than operator splitting algorithms such as FPC [TTII19j .TVCMRI [20] . 
ISTA and FISTA [2J. The numerical results in [I] for the case of K = 2 support this conclusion. 

The rest of this paper is organized as follows. In Section[2jwe propose a class of splitting algorithms based 
on alternating direction and alternating linearization methods for solving (jl.ip and prove that they require 
0(1 /e) iterations to obtain an e-optimal solution. In Section [3] we propose accelerated splitting algorithms 
for solving (|1.1[) and prove they have 0(1/ y/e) complexities. We discuss how to apply our algorithms for 
solving nonsmooth problems by using smoothing techniques in Section 0J Numerical results are presented 
in Section [5] Finally, we summarize our results in Section [6] 

2. A class of multiple splitting algorithms. By introducing new variables, i.e., splitting variable x 
into K different variables, problem (jl.ip can be rewritten as: 

K 

min > fi(x l ) 

2.1 ' 

s.t. x i = x i+1 ,i = 1,..., K- 1. 

In Sections 2 and 3, we focus on splitting and ADM algorithms for solving (|2.1[) and their complexity results. 
We make the following assumptions throughout Sections 2 and 3. 
Assumption 2.1. 

• fi(-) : M" —}M.,i = l,...,Kisa smooth convex function of the type C • , i.e. continuously differen- 
tiable with Lipschitz continuous gradient: 

||V/i(x) - Vfi(y)\\ < L(fi)\\x - y\\,Mx, y G R", 

where L(fi) is the Lipschitz constant. 

• Problem (jl.ip is solvable, i.e., X* := argmini 7, ^ 0. 
We define the term e-optimal as follows. 

Definition 2.2. Suppose x* is an optimal solution to the following problem 

(2.2) mm{/(x) : x £ C}. 

x G C is called an e-optimal solution to (|2.2p if f(x) — f(x*) < e holds. 
The following notation is adopted throughout Sections 2 and 3. 

Definition 2.3. We define fi(u,v) as the linear approximation to fi(u) at a point v plus a proximal 
term: 

fi(u,v) := fi(v) + (Wfi(v),u- v) + ±-\\u- v\\ 2 , 



where [i is a penalty parameter. We use Qi(u 1 , . . . , v % 1 , u, v t+1 , . . . , v ) to denote the following approxima- 
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tion to the function F(u): 



K 

Q i (v\...,v i -\u,v i+ \...,v K ):=f i (u)+ J2 fay)* 

i.e., Qi is an approximation to the junction F, where the i-th function /j is unchanged but the other functions 
are approximated by a linear term plus a proximal term. We use Pi(v l , . . . , v 1 ^ 1 , v l+1 , . . . , v K ) to denote the 
minimizer of Q^V 1 , . . . , u, v l+1 , . . . , v ) with respect to u, i.e., 



(2.3) Pi (v\ v l ~\v t+1 , v K ) := argmin Qi(v\. . . ,v 



^\uy+\...,v K ) 



With the above notation, we have the following lemma which follows from a fundamental property of a 
smooth function in the class C 1 ' 1 ; see e.g., [3]. 

Lemma 2.4. For fi defined as in Definition \2.S\ and /i < 1/ m&xi<i<K L(fi), we have for i = 1, . . . , K , 

fa) < My) + (Vfi(y), x-y) + ^-\\x - y\\ 2 < fa, y),Vx,y e R n . 



The following key lemma is crucial for the proofs of our complexity results. Our proofs of this lemma 
and most of the results that follow in this and the remaining sections of the paper closely follow proofs given 
in [5] for related lemmas and theorems. 

Lemma 2.5. For any i = 1, . . . , K, u,v x ,..., v l+1 , . . . , v K eR™ and \x < 1/ maxi^^ L(fi), we 
have, 



K 

(2.4) 2^(F(u)-F(p))> Y, (lb' 



u\\ 2 - \W -u\\ 2 ) 



where p := p^v 1 , . . . , v l 1 , v l+1 , . . . , v ). 

Proof. From Lemma 12.41 we know that F(p) < Q^v 1 , . . . , i> I-1 ,p, v l+1 , . . . , v ) holds for all i and 
v 1 , . . . , v'- 1 , v i+1 , ...,v K £ R n . Thus, for any u e M" we have, 

(2.5) - F(p) > F(u) - Qi(v\ . . . , v l -\p, v i+1 ,. ..,v K ) 

K 

E 

A' 



= /<(«) - /<(p) + E (/,-(«) - f,(vt) - (vfriv^p- + yi w 

> (VMp),u-p) + J2 (<V/i(^),«- « j ) - <V/ 3 -(^),p-^'> + ^|| 2 ) 



K ( 1 

= <V/ f (p),«-p)+ X! (< v / 3 V),«-p)+2-||p-« J '|| a 
= E ( \ "(p - v r ),u-p\ - 2~l|p- ^|| 2 

where the second inequality is due to the convexity of the functions fj,j = 1 , . . . , K and the last equality is 
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from the first-order optimality conditions for problem (|2.3p . i.e., 

(2.6) V/i(p)+ E + =0- 
Then using the identity 

(2.7) ||a- C || 2 H|fe- C || 2 = ||a-&|| 2 + 2(a-M- C ), 
we get the following inequality: 

2^(F(u) -*-(?))> J] ((-2(p-vi),u-p)-\\p-vi\\ 2 ) 

= E (lb-^U 2 -IK'-«ll 2 )- 

□ 

Our multiple splitting algorithms (MSA) for solving (|2.1j) are outlined in Algorithm [IJ where 6 
jjA'xk j g a (jo^iy stochastic matrix, i.e., 

/c--»-xx i-E^,: i. •.•;../ i K . 

3 = 1 t=l 

One natural choice of DW is to take all of its components equal to 1/K. In this case, all w % n.yi = 1, . ■ . ,K 

Algorithm 1: A Class of Multiple Splitting Algorithms (MSA) 
Set x = .tJ 0) = . . . = xf 0) = w 1 ^ = . . . = wfo and /Lt < 1/ maxi<i<x L(fi). 
for k = 0, 1, ■ • ■ do 

• for each i = 1, . . . , K, compute x^ k+1 ^ := pi(w 1 ^ w^), 

• compute 



are equal to X^i-Li x^/K, i.e., the average of the current K iterates. 

At iteration fc, Algorithm [T] computes K points sri fc \ , i = 1, . . . , K by solving X subproblcms. For many 
problems in practice, these K subproblcms arc expected to be very easy to solve. Another advantage of the 
algorithm is that it is parallelizable since given w l ^,i = l,...,K, the K subproblcms in Algorithm [T] can 
be solved simultaneously. Algorithm (JlJ can be viewed as an alternating linearization method since at each 
iteration, K subproblems are solved, and each subproblcm corresponds to minimizing a function involving 
linear approximations to some of the functions. Although Algorithm [1] assumes the Lipschitz constants are 
known, and hence that fi is known, this assumption can be relaxed by using the backtracking technique in 
[2] to estimate at each iteration. 

We prove in the following that the number of iterations needed by Algorithm Q] to obtain an e-optimal 
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solution is 0(l/e). 



(2.8) 



Theorem 2.6. Suppose x* is an optimal solution to problem (|2.1[) . For any choice of y, < 1/ maxi<i</f L(fi), 
sequence fe fc j,itilj}^.j generated by Algorithm^ satisfies: 

(K-i)\\ xo -x*r 



< 



2/ifc 



Thus, the sequence {miHt=i 1 .., J ji p ffen)} produced by Algorithm^ converges to F(x*). Moreover, if fj, > 
/3/ maxi{L(/i)} where < /3 < 1, i/ie number of iterations needed to obtain an e-optimal solution is at most 
\C/e] , where C 



{K-l)me,-Ki{L{fi)}\\x -x*\\ 2 
20 



Proof. In (|2.4p. by letting 



u = x* .v = w l - 



n ),j = 1, • • - ,-K", J 7^ «, we have p = < +1 and 



(2.9) 



2 A( (F(a:*)- J F(a ; l ( „ +1) )) > £ 



Nn+D-^ll'-Hn)- 1 *!! 2 



(^-^(Nn+D-^^-Hn)- 1 *" 2 ) 



Using the definition of w!, in Algorithm [TJ we have 



A" 



(2.10) 



E K>+1) - 



**l| 2 = 



E 

i=l 
A' 

E 



A 



5=1 
A 



(™+i)„j 

X (n+1) X 



ji ^ X (n+1) 



X*) 



A A 



<££^ + V 



(n+l) 



i=l j=l 

Eli-? 



n+l) 



where the second and the last equalities are due to the fact that is a doubly stochastic matrix and 

the inequality is due to the convexity of the function || • j| 2 . 



Thus by summing (|2.9p over i — 1 , . . . , K wc obtain 
(2.11) 



2M [KF(x*) J2 F(x\ n+1) )\ >{K-1) [J2 H«+i) - ^ll 2 - E H») 

V i=l / \i=l i=l 



A 



A 



> (k - 1) e H»+d - ^n 2 - E n^(-) - 



where the last inequality is due to (|2.10l) . 



Summing (|2.11[) over n = 0, 1, . . . , k — 1, and using the fact that wfo = Xo, i = 1, . . . , K, yields 

(fc-l k \ K 

kKF(x*) - E_E ) > ~ 1) E (K) " *T " US) - ^H 2 ) 



n— i— 1 / z— 1 

»*l|2 



> -K{K- l)||x -a;* 



In by letting u = v j = w l {n) ,j = 1, . . . , K,j ^ i, we get p = x l (n+1) and 

K 

(2-13) 2 M (F( W | n) )-F(^ n+1) )) > E Nn+i) -^)l| 2 >0. 

From the way we compute w 1 ^ ,i = 1, . . . , K, and the facts that F is convex and D^ n > is a doubly stochastic 
matrix, we get 



( 2 -i 4 ) E F H l )) = E^(E^. 

i=l i=l \j=l 

K K 

E F ( x («)) 



Now summing f|2 . 13[) over i = 1 , . . . , K and using (|2.14p , we obtain 
(2-15) 2 M ( E F(x\ n] ) - e n^n+i))) > °- 

\i=l i=l / 

This shows that the sums ^( x \ n )) are non-increasing as n increases. Hence, 

fe-i if if 

(2.i6) EE^H-m))^^^))- 

n— i— 1 i— 1 

Finally, combining (f2TT2|) and (f2TT6|) yields 



2^ ^AF(x*) - fcE F Hfe))) ^ - 1)11^0 ~ x*\ 



Hence, 



„^ (A' -IJIIxo-a;*!! 



2/xfc 



It then follows that min j=li ... iJr Ffc^) - F(o;*) < e, if fc > [C/e] , where C = {K ~ 1] ^dL(h)}\\x -x* || ^ 
and hence that for any k > |~C/e] , := argmin{a;^ fc ^ |-F(a;^ fc ^), i = 1, . . . , A'} is an e-optimal solution. □ 
Remark 2.7. // in the original problem (jl.lj) . a; is subject to a convex constraint x £ C, where C is 
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a convex set, we can impose this constraint in every subproblem in MSA and obtain the same complexity 
result. The only changes in the proof are in Lemma \2. 5\ If there is a constraint x £ C, then (|2.4j) and (|2.5j) 
hold for any u £ C and the last equality in (|2.5p becomes a "> " inequality due to the fact that the optimality 
conditions (|2.6[) become 

^V/ l (p)+ 2 (v/^) + i(p-^')) ,u-p^ >0,VueC. 

Unfortunately, this extension is not very practical, since for it to be useful, adding the constraint in every 
subproblem would most likely make most of these subproblems difficult to solve. 



3. A class of fast multiple splitting algorithms. In this section, we give a class of fast multiple 
splitting algorithms (FaMSA) for solving problem (|2.1[) that require at most 0(1/ y/e) iterations to obtain 
an e-optimal solution while requiring a computational effort at each iteration that is roughly the same as 
Algorithm [TJ Our fast multiple splitting algorithms are outlined in Algorithm [2] where € R KxK is a 
doubly stochastic matrix. 



Algorithm 2: A Class of Fast Multiple Splitting Algorithms (FaMSA) 
Set xq = x\ Q s = w l ( Q j = = 1, • • • , K, t\ = 1, and choose fj, < 1/ maxi<j</<- L{fi) 

for k = 1, 2, ■ • ■ do 

• for i = 1, . . . ,K, compute x\ k ^ = pi(w\ k y. . . , wfy) 

• compute (w\ kV ...,wf k) y.= [x\ k) , ...,xf k ^j £>(*) 

• compute tk+i = (1 + v/1 + 4tJ)/2 

• for i = 1, . . . , K, compute w\ k+1) := w\ k) + ^ (t k (x\ k) - w| fc _ 1} ) - (w l {k) - . 



To establish the 0(1/ \fe) iteration complexity of FaMSA, we need the following lemma. 

Lemma 3.1. Suppose x* is an optimal solution to problem (|2.ip . For any choice of n < maxi<i<K L{fi), 
the sequence {%\ k yW\ k yW\ k yif=\ generated by Algorithm^ satisfies: 

K 

(3-1) 2»(t 2 k v k - t 2 k+1 v k+1 ) >(K- (114+iH 2 - Hf) . 

i=i 

where v k ■= Yh=i F ( x \ k )) ~ KF(x*) and u\ := t k x^ k) - (t k - l)w\ k _^ - x* ,i = 1, . . . , K. 

Proof. In (j2~l|) . by letting u = w^^v 3 = w l {k+1 yj = 1, . . . , K,j ^ i, we get p = x^ k+1) and 

K 

(3-2) 2»(F{w\ k) )-F{x\ k+l) j) > Yl (ll^ + i)-^ fe )ll 2 -|l^ + i)-*( fe )ll 2 ) 

= (K 1) (\\x\ k+1) - ^ fe) || 2 - ||^ fc+1) - w\ k) f) . 

Summing p.2p over i = 1, . . . , K, and using the facts that F is convex and is a doubly stochastic 
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matrix, we get 



2 M (X>(^ fc) ) - f^F(x\ k+1) ) j > 2 M (X>(^ fe) ) ~ E F (^+i)) ) 

\i=l i=l / \i=l i=l ) 

>(K-i)f^(\\x\ 



(fc+1) - W'(fc)ll 2 - Hfc+1) - ™(fc)ll 2 ) . 



i.e., 

A 

(3.3) 2fi(v k - v k+1 ) > (K - l)J2 



In (|2.4|) , by letting u = a;*,?;- 5 = W( fc+1 ), we get p = x % , k+r ^ and 

K 



\ x \k+i) ~ x *ll 2 ~~ ll w (fc+i) ~~ • E *!l 2 



(3-4) 2 M (F(x*)- J FV (fe+1) )) > 2 

= (i^-i)(||4 +1) - a; *|| 2 -|K (fc+1) - a; *|| 2 ). 

Summing (|3.4p over i = 1 , . . . , K we obtain 



A" 



(3.5) -2(iv k+1 >{K-l)J2 



i=i 



k( fc+1 )-^ii 2 -H fc+ i)-^ii 2 



Now multiplying (|3.3[) by t 2 and (|3.5p by ijt+i, adding the resulting two inequalities, using the relation 
t k = tfc+i (tfe+i — l)j and the identity (|2.7p . we get 

(3.6) 2/i(i^ fc - tl +1 v k+ i) 

K 

>{K-i)j^t k+x {t k+1 i) (n4 +1) - u,yi 2 - \\w\ k+1) -w\ k) f) 



A" 



(A'-l)E^i 



i=l 

A 



k( fc+ i) -z*ll 2 - lk( fc+ i) -^*ll 2 



= ( A ' _ l)5^*fc+l(*AH-l - 1) (ll' T (fc+l) - w (fe+l)H 2 + 2 ( x (fc+l) ~ W(jfe + l)>W(fc+l) ~ «>(*) 



i=l 

A 



c^- i )Zl* fc + i (ii x (fe+i) ^ w (fc+i)ii 2 + 2(^+1) -^+1)^^+1) 



i=l 

A 



=(* - 1) Yl (tl+Mk+i) - ^ fc+1) || 2 + 2t k+1 (x\ k+1) - w\ k+1) ,t k+1 w\ k+1) - (i fc+1 - l)tS| fc) - z*)) 



i=i 

A' 



i=l 



< K ~ !) S (H^+i^fc+i) " ( tfc +! " 1 ) tS (*) _ X *H 2 _ ll^+i^fc+i) ~ (*fc+i ~ !)<&(*) - 



From the way we compute w^ k+1 j in Algorithm |21 i.e., 



^(fc+i) — ■ tfc 

it follows that 



^+iW( fe+ i) - (tk+i - l)^( fc ) - = tfcai^j - - 



Thus, from (|3.6|) and the definition of it follows that 



if 

i=X 
if 

=(^-i)^(ik + iii 2 -ikn 2 )- 

This completes the proof. □ 



Before proving our main complexity theorem to Algorithm [2] we note that the sequence {tk} generated 
by Algorithm [5] clearly satisfies tk+i > tk + |, and hence tk > (k + l)/2 for all k > 1 since = 1. 



Theorem 3.2. Suppose x* is an optimal solution to problem (|2.1[) . For any choice of \x < maxi<^<A' L(fi), 
\k)> w (fc)'^(fe)J 



£/ie sequence Wwi^ftii^mlti generated by Algorithm^ satisfies 



(3.7) min ^-F^^ ^fe^ - 

V ' i=X,...,if V W y V ' + l) 2 

Thus, the sequence {mini=i ) ... ) if F(x/j.^)} produced by Algorithm^ converges to F(x*). Moreover, if fx > 
/3/ maxj{L(/j)} where < /3 < 1, ifte number of iterations needed to obtain an e-optimal solution is at most 
[_-JCje\, where C = 2(K - 1) maXi{L(fi)}\\x - x*\\ 2 /f3. 



Proof. By rewriting p.l[) as 



if if 
2fxt 2 k+1 v k+1 + (K - 1) £ ||4 +1 || 2 < 2/xiK + (-K" - 1) E H^ll 2 - 
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we get 



(^) 2 v h < 2iit\v k + (K - 1) J2 
^ ' i=l 

K 



i=l 
K 

= 2^v 1 + {K-l)Y J Hi)-^\\ 2 

K 

<(K-l)J2Hx)-x*\\ 



*||2 

\\' w (l) - * 

i=l 

*||2 



= 1)11 x - x* 

where the first inequality is due to t k > (k + l)/2, the first equality is from the facts that t\ = 1 and 

u i = x \i) 

1,...,K. 



u\ = x\-, — x* , the third inequality is from letting k = in Q3.5P and the last equality is due to w 1 /^ — xo,i 



Thus, from v k = J2? =1 F(x\ k) ) - KF(x*) we get 



(k)) 

z— 1 v ' 



which implies that 



1 A ' 

min^ F(x\ k) ) - < - ^Ti^) ~ F ( x *) ^ 



2(K-l)\\x - x*f 

Mfc + i) 2 



i.e., (|3J|) holds. 

Moreover, it follows that if C/(fc + l) 2 < e, i.e., k > l^/Cje\, then minj = i K F(x\ k) ) - < 

e, where C = 2(K — 1) maxi{L(/i)}||xo — x*\\ 2 /(3. This implies that for any k > [y/C/e\ , a;^,) := 
argmin{a^ fc -) |.F(x^), i = 1, . . . , K} is an e-optimal solution. □ 

REMARK 3.3. Although we have assumed that the Lipschitz constants L(fi) are known, and hence that fi 
is chosen in Algorithm\^to be smaller than 1/ m&xi<i<K{L(fi)} , this can be relaxed by using the backtracking 
technique in JJ^j that chooses a fi at each iteration that is smaller than the \i used at the previous iteration 
and for which F(p) < Q ? :(V (fc) , . . . , wfa , p, w^ k) , . . -,wfa) for all i. 

3.1. A variant of the fast multiple splitting algorithm. In this section, we present a variant of the 
fast multiple splitting algorithm (Algorithmic]) that is much more efficient and requires much less memory 
than Algorithm [2] for problems in which K is large. This variant uses := l/Kee T , where e is the 

n-dimensional vector with all ones, and replaces xWs in the last line of Algorithm [2] by w l ^ k y i.e., in the last 
line of Algorithmic we compute w l ^ k+1 j for i = 1, . . . , K by the formula: 



it - 1 



(fc+1) := w\ k) + -j^Hk) ~ %-!))■ 

It is easy to see that in this variant, the ,i = 1, ■ ■ ■ ,K are all the same and the W % , k+1 ^ , i = 1, . . . , K are 
all the same. We call this variant FaMSA-s, where s refers to the fact that this variant computes a "single" 
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vector w k and a single vector w(k+i) at the fc-th iteration. It is given below as Algorithm |3J 



Algorithm 3: A variant of FaMSA (FaMSA-s) 
Set xo = xl s = W(o) = W(i),i = 1, • • • , K, ti = 1, and choose p. < 1/ maxi<i<A _ L(fi) 
for k = 1, 2, • • • do 

• for i = 1, . . . , K, compute x % {k) = Pi(w( k ),. . . , 

• compute %) := ^ E ^i ^( fc) 

• compute t fc+ i = (1 + y/l + 4tJ)/2 

• for i = 1,. . compute w (fc+1) := + |^(u>(fc) - «>(fe-i))- 



It is easy to verify that the following analog of Lemma [37X1 applies to Algorithm FaMSA-s. 
Lemma 3.4. Suppose x* is an optimal solution to problem (|2.1[) . The sequence {wn^jW^)} generated 
by Algorithm FaMSA-s satisfies: 

K 

2p(t 2 k v k - t 2 k+1 v k+1 ) > (K - 1) J~J (ll^+ill 2 - ll^ll 2 ) > 

i=l 

where v k := K(F(w^) - F(x*)) and u k := t k w {k) - (t k - l)w^k-i) - x* . 

Proof. The proof is very similar to the proof of Lemma |3. 11 hence, we leave it to the reader. The main 
difference is that instead of using the inequality J2i=i ^(^(fe)) — Si=i F( x \k)) to re pl ace tne sum involving 
w^ k y we use the fact that KF{w k +i) < J2i=i ^( x \k+i)) *° re P^ ace the sum involving in the proof. □ 

From Lemma \3A\ Theorem 13.21 with and w\ k y respectively, for i = 1,...,K replaced by W(k) an d 
w/k) follows immediately for FaMSA-s. 

4. Multiple splitting algorithms for nonsmooth problems. Although for the above results we 
required all functions to be in the class of C ' , our algorithms can still be applied to solve nonsmooth 
problems by first smoothing all nonsmooth functions. One of the most important smoothing techniques is the 
one proposed by Nesterov |24j . We use the i^-norm function as an example to show how Nestcrov's smoothing 
technique works. Note that the l\ function f(x) := \\x\\i can be rewritten as ||x||i = max{(x,w) : u € U}, 
where U := {u : ||w||oo _■ 1}- Since U is a bounded convex set, we can define a prox-function d(u) for the 
set U, where d(u) is continuous and strongly convex on U with convexity parameter a > 0. For U defined 
as above, a natural choice for d(u) is d(u) := 1| and thus a = 1. Hence, we have the following smooth 
approximation for f(x) = \\x\\i: 

f p (x) := max{(a;, u) — pd(u) : u G U}, 

where p is a positive smoothness parameter. It can be shown that f p (x) is well defined and is in the class of 
C 1 ' 1 and its gradient is Lipschitz continuous with constant L p = (see Theorem 1 in [24]). Also, it is easy 
to show that the following relations hold for f(x) and f P (x): 

f P (x) < f(x) < f p (x) + pD, 

where D := max{d(u) : u <E U}. Therefore, to get an e-optimal solution to a problem involving the l\- 

u 

norm function f(x), we can replace f{x) with (x) to get a smooth problem, and then apply our splitting 
algorithms to the new problem to get an ^-optimal solution, which will be e-optimal to the original nonsmooth 
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problem. Since is 0(1/ 1), our fast 0(1/ ^fe) algorithms require 0(1/ e) iterations to compute an e-optimal 
solution. 

For nonsmooth problems in imaging, data analysis, and machine learning, etc. with regularization terms 
that involve total variation and the nuclear norm, we can use similar smoothing techniques to smooth these 
nonsmooth functions, and then apply our multiple splitting algorithms to solve them. 



5. Numerical experiments. We present some preliminary numerical experiments in this section. 
Specifically, we apply our MSA and FaMSA algorithms to solve the Fermat- Weber problem and a total 
variation and wavelet based image deblurring problem. All numerical experiments were run in MATLAB 
7.3.0 on a Dell Precision 670 workstation with an Intel Xcon(TM) 3.4GHZ CPU and 6GB of RAM. 



5.1. The Fermat-Weber problem. The Fermat- Weber (F-W) problem can be cast as: 

K 

(5.1) mm.F(x) = y^\\x-c% 



where c 1 £ M. n , i = 1, . . . , K are K given points. Problem (|5.1[) can be reformulated as a second-order cone 
programming (SOCP) problem and thus solved in polynomial time by an interior-point method. Since there 
are K cones, the size of a standard form SOCP formulation for this problem is quite large for large K and 
n. Since fi(x) = 1 1 x — c l j | , i = 1 , . . . , K are not smooth, to apply our MSA and FaMSA algorithms, we need 
to smooth them first. Here we adopt the smoothing technique discussed in section [4j we approximate fi(x) 
by the smooth function 

(5-2) f[(x) := max{ (x-c*,y) - ^\\yf : ||y|| < 1}, 

where p > is a smoothness parameter. The gradient of ff , V/f (x) = y* , where y* is the optimal solution to 
the optimization problem in (|5.2I) . It is easy to show that y* = max {p|^_ e ; || } ■ Moreover, V/f(a;) is Lipschitz 
continuous with constant L(f[) = 1/p. Now we can apply MSA, FaMSA and FaMSA-s to solve 

K 

(5.3) min ]T/f(x). 

i=l 

The i-th subproblem in all of these algorithms corresponds to solving the following problem: 

K ( 1 
(5A) Pl (w\ k) , • • ■ , w\ k) ) := argmin/f (u) + £ tf(w\ k) ) + (Vf?(w{ k) ),u- w\ k) ) + -\\u- w\ k) \ 

It is easy to check that the optimal solution to problem (|5.4|) is given by 

. (K -l)\\z] k) -c%- p . . . . 

- + iK-rZ\ k) -c% - c>) ' if - "» > " + ^ 
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where 



i _ i M \ - w (fc) c 

Hk) W{k) K - 1 max{p, \\w\ k) cf\\}' 

If we choose the doubly stochastic matrix to be := l/Kee T in MSA as we do in FaMSA-s, 
all W/ fc -)'s are the same in MSA as they are in FaMSA-s. Hence, computing xi k \, for i = 1, . . . , K in both 
algorithms can be done efficiently as follows. 



(5.5) 



Z(k) - Ylj=l max{p,j|iu (ie) -^||} 
^(fc) = C * + ( X ~ ma X {(if-l)||z' fc) -c«||, /i +p(/Y-l)})( 2: (fc) ~ C *)> Vi = 1 ' ' " ' ' K 



(k) - -J^( z , i " nrr ),Vt = 1,...,K 



We compared the performance of MSA and FaMSA-s with the classical gradient method (Grad) and 
Nesterov's accelerated gradient method (Nest) for solving (|5.3[) . The classical gradient method for solving 
(|5.3p with step size r > is: 



x k+1 = x k 



r£V/jV). 

j'=i 



The variant of Nesterov's accelerated gradient method that we used is the following: 

| x k = ^-rEjrV//^- 1 ) 
| yk _ x fe _ |^i(x fe - a; fe_1 ). 

We created random problems to test the performance of MSA, FaMSA-s, Grad and Nest as follows. 
Vectors c l £ R™, i = 1, . . . , K were created with i.i.d. Gaussian entries from Af(0, n). The seed for generating 
random numbers in MATLAB was set to 0. We set the smoothness parameter p equal to 10~ 3 . The 
initial points x\i = 1,...,K were set to the average of all of the c"s, i.e., xl Q -. = J^iLi c ' '■ We chose 
D\y = 1/K,i,j = 1,...,K for all k in MSA. To compare the number of iterations needed by MSA and 
FaMSA-s, we first solved (|5.ip by Mosek [3T] after converting it into an SOCP problem to get the optimal 
solution x* , and then terminated MSA, FaMSA-s, Grad and Nest when the relative error of the objective 
function value at the k-th iterate, 

\mm i=1 ,..., K F(x\ k) ) - F(x*)\ 
relerr := 



F(x*) 

was less than 10 -6 . We tested the performance of these four solvers for different choices of r, which is the 
step size for Grad and Nest. Note that since the w/'s are the same in MSA with D^ k > = j^ee T for all k 
and in FaMSA-s, these two methods can be viewed as linearization methods in which the single function 
^2f=i j^i fj( x ) i s linearized at the point w with only one proximal term K ~ 1 1 1 x — w\ in the i-th subproblem. 
So the step size for MSA and FaMSA-s is p/{K — 1). Hence, the parameter p for MSA and FaMSA-s was 
set to p = t(K — 1) in our numerical tests. 

Our results are presented in Table I5TT1 The CPU times reported are in seconds. These results show 
that for the F-W problem, our implementations of MSA and FaMSA-s take roughly between two and three 
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times as much time to solve each problem as taken by Grad and Nest, respectively. This is not surprising 
since it is clear that the computation of each set of K vectors z| fe \ and x 1 ,^ for i — 1,...,K in (|5.5j) is 
roughly comparable to a single computation of the gradient, i.e., the K gradients of f[(x), for i = 1, . . . , K. 
Moreover, for the simple F-W objective function, not much is gained by minimizing only one out of the K 
individual functions /f (x), i = 1,...,K, when K is large as it is in our tests. Note that the number of 
iterations required by MSA and Grad were exactly the same on our set of test problems. When K is of a 
moderate size and the individual functions are more complicated, MSA should require fewer iterations than 
Grad. 



Table 5.1 

Comparison of MSA, FaMSA-v, Grad and Nest on solving Fermat-Weber problem 1 15. 31 1 



Problem 


Mosek 


MSA 


FaMSA-s 


Grad 


Nest 


n 


K 


time 


iter 


relerr 


time 


iter 


relerr 


time 


iter 


relerr 


time 


iter 


relerr 


time 


t = 0.001 


50 


50 


0.85 


500 


4.1e-05 


0.73 


107 


8.4e-07 


0.17 


500 


4.1e-05 


0.21 


109 


9.0e-07 


0.05 


50 


100 


3.40 


500 


5.6e-06 


1.44 


69 


9.9e-07 


0.21 


500 


5.6e-06 


0.42 


72 


8.5e-07 


0.07 


50 


200 


0.96 


427 


9.9e-07 


2.44 


47 


8.8e-07 


0.28 


427 


9.9e-07 


0.69 


49 


8.6e-07 


0.09 


100 


100 


1.78 


500 


8.9e-06 


1.68 


94 


9.8e-07 


0.33 


500 


8.9e-06 


0.51 


97 


9.1e-07 


0.10 


100 


200 


4.48 


500 


1.6e-06 


3.35 


60 


9.2e-07 


0.42 


500 


1.6e-06 


1.00 


62 


9.2e-07 


0.13 


100 


400 


9.40 


198 


1.0e-06 


2.68 


34 


9.5e-07 


0.47 


198 


1.0e-06 


0.79 


36 


9.2e-07 


0.15 


200 


200 


22.22 


500 


2.3e-06 


4.36 


75 


9.9e-07 


0.67 


500 


2.3e-06 


1.39 


77 


9.9e-07 


0.22 


200 


400 


45.55 


275 


1.0e-06 


4.81 


41 


9.9e-07 


0.73 


275 


1.0e-06 


1.54 


43 


9.8e-07 


0.25 


200 


800 


100.15 


41 


1.0e-06 


1.44 


15 


9.7e-07 


0.54 


41 


1.0e-06 


0.46 


16 


9.8e-07 


0.18 


300 


300 


102.64 


419 


1.0e-06 


6.73 


52 


9.9e-07 


0.85 


419 


1.0e-06 


2.22 


54 


9.9e-07 


0.29 


300 


600 


194.99 


24 


1.0e-06 


0.79 


11 


9.9e-07 


0.37 


24 


1.0e-06 


0.26 


12 


9.9e-07 


0.14 


300 


1200 


401.54 


1 


5.8e-07 


0.08 


1 


5.8e-07 


0.08 


1 


5.8e-07 


0.03 


1 


5.8e-07 


0.03 


r = 0.01 


50 


50 


0.84 


238 


9.9e-07 


0.36 


32 


8.3e-07 


0.06 


238 


9.9e-07 


0.11 


34 


7.5e-07 


0.02 


50 


100 


3.36 


93 


9.9e-07 


0.29 


20 


9.6e-07 


0.07 


93 


9.9e-07 


0.08 


22 


7.6e-07 


0.03 


50 


200 


0.96 


42 


9.9e-07 


0.26 


13 


9.2e-07 


0.09 


42 


9.9e-07 


0.07 


15 


5.9e-07 


0.03 


100 


100 


1.78 


160 


1.0e-06 


0.55 


28 


9.0e-07 


0.11 


160 


1.0e-06 


0.17 


30 


8.1e-07 


0.04 


100 


200 


4.48 


62 


9.8e-07 


0.43 


17 


9.2e-07 


0.13 


62 


9.8e-07 


0.13 


19 


7.5e-07 


0.05 


100 


400 


9.46 


20 


9.5e-07 


0.28 


9 


9.1e-07 


0.13 


20 


9.5e-07 


0.09 


10 


9.2e-07 


0.05 


200 


200 


22.37 


91 


1.0e-06 


0.81 


22 


9.2e-07 


0.21 


91 


1.0e-06 


0.26 


23 


1.0e-06 


0.07 


200 


400 


45.56 


28 


9.7e-07 


0.50 


11 


9.9e-07 


0.21 


28 


9.7e-07 


0.16 


13 


8.4e-07 


0.08 


200 


800 


99.38 


4 


1.0e-06 


0.16 


4 


8.6e-07 


0.16 


4 


1.0e-06 


0.05 


4 


9.4e-07 


0.05 


300 


300 


100.48 


42 


9.9e-07 


0.69 


15 


9.3e-07 


0.26 


42 


9.9e-07 


0.23 


16 


9.5e-07 


0.09 


300 


600 


194.88 


3 


9.7e-07 


0.11 


3 


9.4e-07 


0.11 


3 


9.7e-07 


0.04 


3 


9.6e-07 


0.04 


300 


1200 


402.16 


1 


5.4e-07 


0.08 


1 


5.4e-07 


0.08 


1 


5.4e-07 


0.03 


1 


5.4e-07 


0.03 


r = 0.1 


50 


50 


0.84 


23 


9.5e-07 


0.05 


9 


3.4e-07 


0.03 


23 


9.4e-07 


0.02 


10 


5.4e-07 


0.01 


50 


100 


3.41 


9 


7.7e-07 


0.04 


5 


6.1e-07 


0.03 


9 


7.7e-07 


0.02 


6 


5.1e-07 


0.01 


50 


200 


0.95 


4 


5.3e-07 


0.04 


3 


2.9e-07 


0.03 


4 


5.2e-07 


0.01 


3 


1.0e-06 


0.01 


100 


100 


1.80 


16 


8.6e-07 


0.07 


8 


3.6e-07 


0.04 


16 


8.6e-07 


0.02 


9 


4.1e-07 


0.02 


100 


200 


4.48 


6 


8.3e-07 


0.05 


4 


7.2e-07 


0.04 


6 


8.3e-07 


0.02 


5 


5.2e-07 


0.02 


100 


400 


9.40 


2 


6.4e-07 


0.04 


2 


4.2e-07 


0.04 


2 


6.4e-07 


0.02 


2 


6.4e-07 


0.02 


200 


200 


22.25 


9 


9.4e-07 


0.09 


6 


5.8e-07 


0.07 


9 


9.3e-07 


0.03 


6 


9.4e-07 


0.02 


200 


400 


45.61 


3 


7.9e-07 


0.07 


3 


5.0e-07 


0.07 


3 


7.9e-07 


0.02 


3 


6.9e-07 


0.03 


200 


800 


99.77 


1 


5.0e-07 


0.05 


1 


5.0e-07 


0.05 


1 


5.0e-07 


0.02 


1 


5.0e-07 


0.02 


300 


300 


100.37 


4 


9.9e-07 


0.08 


4 


6.7e-07 


0.08 


4 


9.9e-07 


0.03 


4 


8.4e-07 


0.03 


300 


600 


197.72 


1 


7.0e-07 


0.05 


1 


7.0e-07 


0.05 


1 


7.0e-07 


0.02 


1 


7.0e-07 


0.02 


300 


1200 


412.49 


1 


2.1e-07 


0.08 


1 


2.1e-07 


0.08 


1 


2.1e-07 


0.03 


1 


2.1e-07 


0.03 
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The purpose of this set of tests was not to demonstrate any advantage that our algorithms might have 
over gradient methods. Rather, they were performed to validate our algorithms and show that the accelerated 
variants like algorithm Nest can reduce the number of iterations required to solve problems of the form (11.11) . 
This is quite clear from the results reported in Table 15.11 We further note that FaMS A-s often takes one to 
three fewer iterations than Nest. Note that for some problems, the multiple splitting algorithm took only 
one iteration to converge. The reason was that for these problems, the number of points was much larger 
than the dimension of the space. Therefore, the points were very compact and fairly uniformly distributed 
around the initial point; hence that point was quite likely to be very close to the optimal solution. 



5.2. An image deblurring problem. In this section, we report the results of applying our multiple 
splitting algorithms to a benchmark total variation and wavelet-based image deblurring problem from [12] , 
In this problem, the original image is the well-known Cameraman image of size 256 x 256 and the observed 
image is obtained after imposing a uniform blur of size 9x9 (denoted by the operator A) and Gaussian noise 
(generated by the function randn in MATLAB with a seed of and a standard deviation of 0.56). Since the 
vector of coefficients of the wavelet transform of the image is sparse in this problem and the total variation 
norm of the image is expected to be small, one can try to reconstruct the image x from the observed image 
b by solving the problem: 

(5.6) min aTV(x) + /?||$a;||i + ^\\Ax - b\\%, 



where TV(x) := y/( x i+i.j x ij) 2 + i x ij x i,j+i) 2 is the total variation of a;, <!> is the wavelet transform, 
A denotes the deblurring kernel and a > 0, (3 > are weighting parameters. Problem (|5.6|) involves 
minimizing the sum of three convex functions with fi(x) = aTV(x), /^(s) = /3||$a;||i and/3(x) = i||Ac— b|||. 

To apply our multiple splitting algorithms to solve (|5.6|) . our theory requires all the functions to be 
smooth functions. So we needed to smooth the TV and the t\ functions first. We adopted the following way 
to smooth the TV function, widely used in the literature for doing this: 



fl(x) := a>Y^ \J ( x t+i.j - Xij) 2 + {Xij - Xi,j+i) 2 + 6. 

y 

The l\ function was smoothed in the way described in Section @] 

f°(x) := /3max{($x, U ) - ^|| U || 2 : \\u\\ x < 1}. 
Thus, the smooth version of problem (|5.6p was: 

(5.7) min f((x) + f°(x) + f 3 (x). 



However, when we applied our multiple splitting algorithms to (|5.7[) . we actually performed the following 
computation on the fc-th iteration: 



x k+i 
k+i 

w k+i 



= argmin, h{x) + (VfZ(w k ),x - w k ) + ±\\x - w k \\ 2 + (Vf 3 (w k ), x - w k ) + ± \\x - w k \\ 2 
= argmi % f^(y) + (Vff(w k ),y - w k ) + ±\\y - w k \\ 2 + (Vf 3 (w k ),y- w k ) + ±\\y - w k \\ 2 
= argmin z / 3 (z) + (Vff{w k ),z- w k ) + ±\\z - w k \\ 2 + (V/f(w fc ), z - w k ) + ±\\z - w k \\ 2 
= (x k+1 +y k+1 +z k+1 )/3. 
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Note that in (|5.8[) . when we linearized the TV function, we used the smoothed TV function ff (•), i.e., we 
computed the gradient of /f(-). But when we solved the first subproblcm, we used the nonsmooth TV 
function /i(-), because there are efficient algorithms for solving this nonsmooth problem. Specifically, this 
subproblem can be reduced to: 

x k+1 := argmin ^TV(z) + ±\\x - (w k - m(V/ 2 <V) + Vf 3 (w k )) /2)|| 2 , 

which is a standard TV-denoising problem. In our tests, we perform 10 iterations of the algorithm proposed 
by Chambolle in [5] to approximately solve this problem. The second subproblem in (|5.8p can be reduced 
to: 

(5.9) y k+1 := argmin ±%(y) + hy- (w k - fx (V/f (w k ) + Vf 3 (w k )) /2)|| 2 . 

y 2 2 

It is easy to check that the solution of (|5.9|) is given by: 

y k+1 := $ T ($w k - ^-w k ^j 

where (w k )j = max{ — 1, min{l, -^fpjj }} and w k = w k — fx (Vff(w k ) + Vf 3 (w k )^j /2. The third subproblem 
in (|5.8[) corresponds to solving the following linear system: 

(A T A + 2/fxI)z = A T b - V/f(w fc ) + 2//V= - Vf s 2 {w k ). 

Solving this linear system is easy since the operator A has a special structure and thus (A T A + 2/fiI) can 
be inverted efficiently 

In our tests, we set a = 0.001, (3 = 0.035 and used smoothing parameters S = a = 10~ 4 . The initial 
points were all set equal to 0. We compared the performance of MSA, FaMSA, FaMSA-s and Grad for 
different /x and step sizes r. In these comparisons, we simply terminated the codes after 500 iterations. The 
objective function value and the improvement signal noise ratio (ISNR) at different iterations are reported 
in Table 1531 The ISNR is defined as ISNR := 101og 10 U^ZgL > where x is the reconstructed image and 
x is the true image. As we did for F-W problem, we always used [i = t(K — 1) and since there were 
three functions in this problem, we used /i = It. For large /x, we did not report the results for all of the 
iterations since the comparisons are quite clear from the selected iterations. See Figure 15.11 for additional 
and more complete comparisons. We make the following observations from Table l5?2l For /j, = 0.1, FaMSA-s 
achieved the best objective function value in about 200 iterations and 152 CPU seconds. The best ISNR 
was also achieved by FaMSA-s, in about 300 iterations and 227 seconds. MSA and Grad were not able to 
obtain an acceptable solution in 500 iterations. In fact, they were only able to reduce the objective function 
to twice the near-optimal value of 3.86 x 10 4 achieved by FaMSA-s. For /j, = 0.5, FaMSA-s achieved the 
best objective function value and ISNR in 100 iterations and 76 seconds and 125 iterations and 94 seconds, 
respectively. Again, MSA and Grad did not achieve acceptable results even after 500 iterations. For // = 1, 
MSA achieved the best objective function value, 3.73 x 10 4 , after 500 iterations and 349 CPU seconds, while 
the best ISNR was achieved by FaMSA-s in 80 iterations and 61 seconds. Also, the best objective function 
value achieved by FaMSA-s was at the 60-th iteration after only 47 CPU seconds. We also note that for 
/i = 0.1, 0.5 and 1, MSA was always better than Grad and FaMSA-s was always slightly better than FaMSA. 
Another observation was that MSA always decreased the objective function value for [i = 0.1,0.5 and 1, 
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while FaMSA and FaMSA-s always achieved near-optimal results in a relatively small number of iterations 
and then started getting worse. However, in practice, one would always terminate FaMSA and FaMSA-s 
once the objective function value started increasing. For fj, = 5, MSA gave very good results while the other 
three solvers diverged immediately. Specifically, the best objective function value 3.73 x 10 4 was achieved by 
MSA in 120 iterations and 80 CPU seconds, and the best ISNR was achieved by MSA in 200 iterations and 
132 CPU seconds. Thus, based on these observations, we conclude that FaMSA-s attains a nearly optimal 
solution very quickly for small [i while MSA is more stable for large fi. 

We also plotted some figures to graphically illustrate the performance of these solvers. Figures (a), (b) 
and (c) in Figure HTT1 plot the objective function value versus the iteration number for fi = 0.1,0.5 and 1, 
respectively Figures (d), (e) and (f) in Figure ISTTl plot ISNR versus the iteration number for fi = 0.1,0.5 
and 1. We did not plot graphs for \i = 5, since FaMSA, FaMSA-s and Grad diverged from the very first 
iteration. From Figure 15.11 we can see the comparisons clearly. Basically, these figures show that FaMSA 
and FaMSA-s achieve a nearly optimal solution very quickly. We can also see from (b), (c), (e) and (f) that 
FaMSA-s is always slightly better than FaMSA and MSA is always better than Grad. 

We also tested setting to the identity matrix in MSA and FaMSA, but this choice, as expected, did 
not give as good results. 

To see how MSA performed for the deblurring problem (|5.7j) . we show the original (a), blurred (b) and 
reconstructed (c) cameraman images in Figure [5~2| The reconstructed image (c) is the one that was obtained 
by applying MSA with fi = 5 after 200 iterations. The ISNR of the reconstructed image is 5.3182. From 
Figure 15.21 we see that MSA was able to recover the blurred image very well. 

6. Conclusions. In this paper, we proposed two classes of multiple splitting algorithms based on alter- 
nating directions and optimal gradient techniques for minimizing the sum of K convex functions. Complexity 
bounds on the number of iterations required to obtain an e-optimal solution for these algorithms were de- 
rived. Our algorithms are all parallelizable, which is attractive for practical applications involving large-scale 
optimization problems. 

Acknowledgement. We would like to thank the anonymous referee for making several very helpful 
suggestions. 
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Table 5.2 

Comparison of MSA, FaMSA, FaMSA-s and Grad on solving TV-deblurring problem 





MSA 


FaMSA 


FaMSA-s 


Grad 


Iter 


obj ISNR 


obj ISNR 


obj ISNR 


obj ISNR 


At = 0.1, t = 0.05 


100 


3.42e+005 0.9311 


4.67e+004 3.6310 


4.66e+004 3.6332 


3.36e+005 0.9344 


200 


1.55e+005 1.5340 


3.89e+004 4.9693 


3.86e+004 4.9821 


1.55e+005 1.5341 


300 


1.13e+005 1.9057 


3.98e+004 5.2695 


3.94e+004 5.2989 


1.13e+005 1.9043 


400 


9.25e+004 2.1905 


4.30e+004 4.6587 


4.26e+004 4.7075 


9.28e+004 2.1871 


500 


7.97e+004 2.4235 


4.76e+004 3.3881 


4.70e+004 3.4500 


8.02e+004 2.4175 


At = 0.5, t = 0.25 


25 


2.41e+005 1.1343 


7.70e+004 2.4777 


7.69e+004 2.4784 


2.36e+005 1.1408 


50 


1.29e+005 1.7359 


4.31e+004 3.9343 


4.28e+004 3.9416 


1.29e+005 1.7376 


75 


9.66e+004 2.1260 


3.92e+004 4.7122 


3.88e+004 4.7324 


9.67e+004 2.1250 


100 


7.96e+004 2.4243 


3.90e+004 5.1257 


3.84e+004 5.1638 


8.00c+004 2.4198 


125 


6.92e+004 2.6659 


3.97e+004 5.2558 


3.90e+004 5.3160 


6.98e+004 2.6569 


150 


6.21e+004 2.8682 


4.12e+004 5.0880 


4.04e+004 5.1737 


6.30o+004 2.8538 


175 


5.71e+004 3.0416 


4.33e+004 4.6478 


4.23e+004 4.7576 


5.82e+004 3.0207 


200 


5.34e+004 3.1928 


4.58e+004 3.9964 


4.46e+004 4.1258 


5.48e+004 3.1646 


225 


5.06e+004 3.3266 


4.86c+004 3.2006 


4.73e+004 3.3442 


5.22e+004 3.2902 


250 


4.85e+004 3.4463 


5.18e+004 2.3223 


5.03e+004 2.4758 


5.03e+004 3.4009 


275 


4.67e+004 3.5545 


5.54e+004 1.4132 


5.37e+004 1.5723 


4.88e+004 3.4991 


300 


4.54e+004 3.6529 


5.93e+004 0.5078 


5.74e+004 0.6705 


4.76c+004 3.5869 


500 


3.99e+004 4.2186 


9.74e+004 -5.2730 


9.43e+004 -5.1193 


4.38e+004 4.0416 


fi = 1, t = 0.5 


20 


1.53e+005 1.5382 


6.35c+004 2.7991 


6.33e+004 2.8006 


1.51e+005 1.5443 


40 


9.23e+004 2.1927 


4.10c+004 4.2214 


4.05e+004 4.2361 


9.22e+004 2.1932 


60 


7.09e+004 2.6220 


3.91e+004 4.9205 


3.84e+004 4.9591 


7.13e+004 2.6158 


80 


5.99e+004 2.9413 


3.96c+004 5.2175 


3.86e+004 5.2890 


6.08e+004 2.9258 


100 


5.34e+004 3.1933 


4.10e+004 5.1371 


3.98e+004 5.2488 


5.47e+004 3.1664 


120 


4.93e+004 3.4003 


4.33e+004 4.6922 


4.19e+004 4.8439 


5.10e+004 3.3597 


140 


4.64e+004 3.5751 


4.62e+004 3.9649 


4.45e+004 4.1489 


4.85e+004 3.5186 


160 


4.44e+004 3.7258 


4.94e+004 3.0524 


4.75e+004 3.2595 


4.68e+004 3.6515 


180 


4.29e+004 3.8578 


5.32e+004 2.0449 


5.10e+004 2.2668 


4.57e+004 3.7637 


200 


4.18e+004 3.9748 


5.74e+004 1.0116 


5.50e+004 1.2419 


4.49e+004 3.8592 


220 


4.09e+004 4.0795 


6.20e+004 -0.0045 


5.93e+004 0.2311 


4.44c+004 3.9407 


240 


4.02e+004 4.1741 


6.70e+004 -0.9780 


6.41e+004 -0.7394 


4.40e+004 4.0103 


260 


3.96e+004 4.2602 


7.22c+004 -1.8951 


6.90e+004 -1.6561 


4.37e+004 4.0695 


280 


3.92e+004 4.3388 


7.77e+004 -2.7506 


7.43e+004 -2.5136 


4.36e+004 4.1197 


300 


3.88e+004 4.4111 


8.34e+004 -3.5436 


7.97e+004 -3.3102 


4.35e+004 4.1620 


500 


3.73e+004 4.9042 


1.35c+005 -8.5246 


1.29e+005 -8.3127 


4.47e+004 4.2742 


At = 5,r = 2.5 


20 


2.54e+007 -2.7911 


1.10e+023 -157.9048 


8.53e+018 -116.7985 


5.63e+015 -84.9895 


40 


4.91e+005 3.7130 


1.37e+040 -328.8532 


8.05e+031 -246.5444 


6.03c+022 -155.2917 


60 


4.69e+004 4.4065 


3.59e+057 -503.0389 


1.68e+045 -379.7479 


6.55e+029 -225.6465 


80 


3.80e+004 4.6991 


1.29e+075 -678.5934 


4.93e+058 -514.4122 


7.15e+036 -296.0278 


100 


3.74e+004 4.9027 


5.53e+092 -854.9100 


1.74e+072 -649.8897 


7.84e+043 -366.4253 


120 


3.73e+004 5.0513 


2.65e+110 -1031.7135 


6.92e+085 -785.8864 


8.61e+050 -436.8334 


140 


3.73e+004 5.1600 


1.37e+128 -1208.8552 


2.99e+099 -922.2437 


9.47e+057 -507.2490 


160 


3.74e+004 5.2373 


7.52e+145 -1386.2455 


1.37e+113 -1058.8660 


1.04e+065 -577.6699 


180 


3.76e+004 5.2888 


4.31e+163 -1563.8262 


6.62e+126 -1195.6913 


1.15e+072 -648.0947 


200 


3.78e+004 5.3182 


2.55e+181 -1741.5574 


3.31e+140 -1332.6769 


1.27e+079 -718.5224 


500 


4.27e+004 4.4523 


Inf -Inf 


Inf -Inf 


5.70e+184 -1775.0426 
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